

 Montevideo




The Morgan-Pitman Test of Equality of Variances and its Application to Machine Learning Model Evaluation and Selection

arXiv.org Machine Learning

Model selection in non-linear models often prioritizes performance metrics over statistical tests, limiting the ability to account for sampling variability. We propose the use of a statistical test to assess the equality of variances in forecasting errors. The test builds upon the classic Morgan-Pitman approach, incorporating enhancements to ensure robustness against data with heavy-tailed distributions or outliers with high variance, plus a strategy to make residuals from machine learning models statistically independent. Through a series of simulations and real-world data applications, we demonstrate the test's effectiveness and practical utility, offering a reliable tool for model evaluation and selection in diverse contexts.
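The classic Morgan-Pitman test that this work builds on relies on the identity Cov(e1 + e2, e1 - e2) = Var(e1) - Var(e2). For paired forecasting errors, equal variances are therefore equivalent to zero correlation between the sum and the difference of the errors. The sketch below implements only that classic, non-robust version. It does not include the paper's robustness enhancements or its strategy for making residuals independent. It uses the textbook t statistic with n - 2 degrees of freedom, and the function name `morgan_pitman_t` is hypothetical. A p-value can then be read from a Student t distribution.

```python
import math
from typing import Sequence, Tuple


def morgan_pitman_t(e1: Sequence[float], e2: Sequence[float]) -> Tuple[float, int]:
    """Classic (non-robust) Morgan-Pitman statistic for paired errors e1, e2.

    Hypothetical helper, not the paper's robust variant. Under
    H0: Var(e1) == Var(e2) and bivariate normality, t follows a
    Student t distribution with df = n - 2.
    """
    n = len(e1)
    if n != len(e2) or n < 3:
        raise ValueError("need two paired samples of equal length >= 3")
    # Sum and difference of the paired errors.
    u = [a + b for a, b in zip(e1, e2)]
    v = [a - b for a, b in zip(e1, e2)]
    mu_u = sum(u) / n
    mu_v = sum(v) / n
    # s_uv equals the difference of the two sums of squared deviations,
    # so its sign says which model's errors vary more.
    s_uv = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    s_uu = sum((a - mu_u) ** 2 for a in u)
    s_vv = sum((b - mu_v) ** 2 for b in v)
    if s_uu == 0 or s_vv == 0:
        raise ValueError("sum or difference of errors is constant")
    r = s_uv / math.sqrt(s_uu * s_vv)  # Pearson correlation of u and v
    df = n - 2
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r), df
    t = r * math.sqrt(df / (1.0 - r * r))
    return t, df
```

A positive t indicates that the first model's errors have the larger variance. Swapping the arguments flips the sign.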


RETUYT-INCO at BEA 2025 Shared Task: How Far Can Lightweight Models Go in AI-powered Tutor Evaluation?

arXiv.org Artificial Intelligence

In this paper, we present the RETUYT-INCO participation in the BEA 2025 shared task. Our participation was characterized by the decision to use relatively small models, with fewer than 1B parameters. This self-imposed restriction reflects the conditions faced by many research labs and institutions in the Global South, where computational power is not easily accessible due to its prohibitive cost. Even under this restrictive setting, our models managed to stay competitive with the rest of the teams that participated in the shared task. According to the exact F1 scores published by the organizers, the performance gaps between our models and the winners were as follows: 6.46 in Track 1; 10.24 in Track 2; 7.85 in Track 3; 9.56 in Track 4; and 13.13 in Track 5. Considering that, according to the exact F1 score, the minimum difference with a winning team is 6.46 points and the maximum difference is 13.13, we find that models with fewer than 1B parameters are competitive for these tasks, all of which can be run on computers with a low-budget GPU or even without a GPU.
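The summary figures above can be reproduced from the five reported gaps. The short script below is a convenience, not part of the paper. It also computes the mean gap across tracks, which comes to 47.24 / 5 = 9.448 points.

```python
# Exact F1 gaps (in points) between the team's models and each track winner,
# as reported in the abstract above.
gaps = {
    "Track 1": 6.46,
    "Track 2": 10.24,
    "Track 3": 7.85,
    "Track 4": 9.56,
    "Track 5": 13.13,
}

closest = min(gaps, key=gaps.get)   # track with the smallest gap
farthest = max(gaps, key=gaps.get)  # track with the largest gap
mean_gap = sum(gaps.values()) / len(gaps)

print(f"smallest gap: {gaps[closest]} ({closest})")
print(f"largest gap: {gaps[farthest]} ({farthest})")
print(f"mean gap: {mean_gap:.2f}")
```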


The Inverse Drum Machine: Source Separation Through Joint Transcription and Analysis-by-Synthesis

arXiv.org Machine Learning

We present the Inverse Drum Machine (IDM), a novel approach to Drum Source Separation that leverages an analysis-by-synthesis framework combined with deep learning. Unlike recent supervised methods that require isolated stem recordings, our approach operates on drum mixtures with only transcription annotations. IDM integrates Automatic Drum Transcription and One-shot drum Sample Synthesis, jointly optimizing these tasks in an end-to-end manner. By convolving synthesized one-shot samples with estimated onsets, akin to a drum machine, we reconstruct the individual drum stems and train a Deep Neural Network on the reconstruction of the mixture. Experiments on the StemGMD dataset demonstrate that IDM achieves separation quality comparable to state-of-the-art supervised methods that require isolated stem data, while significantly outperforming matrix decomposition baselines.

In Western popular music, the rhythmic foundation typically relies on percussion instruments from a standard drum kit comprising kick drum, snare drum, and hi-hat, while additional elements such as cymbals, tom-toms, and auxiliary percussion provide timbral complexity and rhythmic variation. Music producers and engineers often need to adjust individual drum instruments separately for remixing, rebalancing, effects processing, or creating educational materials [1], [2]. Ideally, music production would use isolated recordings of each drum instrument (known as "stems"), allowing for precise control during mixing. However, these instruments are usually played simultaneously by the same performer, resulting in recordings in which all elements are mixed into a single audio stream. Obtaining separated stems during recording requires multiple microphones (which leads to microphone bleeding) or asking musicians to play under unnatural conditions [3]. The need for tools that can extract individual drum stems from already mixed recordings has led to growing interest in Drum Source Separation (DSS).
These solutions, however, are proprietary and still have limitations in separation quality and flexibility. DSS is challenging due to the acoustic properties of percussion sounds.
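The drum-machine-style reconstruction described in the abstract above can be sketched as a toy example in pure Python. Each stem is a one-shot sample convolved with an onset sequence, and the mixture is the sum of the stems. The sketch assumes discrete activation sequences whose nonzero values act as hit velocities. In IDM the one-shots and onsets are estimated by neural networks and optimized end-to-end, which is omitted here. The names `render_stem` and `mix` are hypothetical.

```python
from typing import Dict, List


def render_stem(activations: List[float], one_shot: List[float]) -> List[float]:
    """Convolve an onset/velocity sequence with a one-shot sample.

    Hypothetical helper. The output is truncated to len(activations), so a
    hit near the end is cut off, as in a fixed-length render.
    """
    length = len(activations)
    out = [0.0] * length
    for k, amp in enumerate(activations):
        if amp == 0.0:
            continue  # no hit at this frame
        # Add a copy of the one-shot, scaled by the hit velocity, starting at frame k.
        for j, s in enumerate(one_shot):
            if k + j >= length:
                break
            out[k + j] += amp * s
    return out


def mix(stems: Dict[str, List[float]]) -> List[float]:
    """Sum equal-length per-instrument stems into a single mixture signal."""
    length = len(next(iter(stems.values())))
    return [sum(stem[n] for stem in stems.values()) for n in range(length)]


# Toy one-shots and onset sequences (hypothetical values).
kick = render_stem([1.0, 0.0, 0.0, 0.5, 0.0, 0.0], [1.0, 0.5])
snare = render_stem([0.0, 0.0, 1.0, 0.0, 0.0, 1.0], [0.8, -0.2])
mixture = mix({"kick": kick, "snare": snare})
```

Training on the reconstruction of the mixture then amounts to comparing `mixture` with the observed drum mix, so no isolated stems are needed as targets.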


Identifying and Characterising Higher Order Interactions in Mobility Networks Using Hypergraphs

arXiv.org Artificial Intelligence

Human mobility data is crucial for understanding patterns of movement across geographical regions, with applications spanning urban planning [1], transportation systems design [2], infectious disease modeling and control [3, 4], and social dynamics studies [5]. Traditionally, mobility data has been represented using flow networks [6, 7] or colocation matrices [8], where the primary representation is via pairwise interactions. In flow networks, directed edges represent the movement of individuals between two locations; colocation matrices measure the probability that a random individual from one region is colocated with a random individual from another region at the same location. These data types and their pairwise representation structure have been used to identify the spatial scales and regularity of human mobility, but they are inherently limited in their capacity to capture more complex patterns of human movement involving higher-order interactions between locations, that is, groups of locations that are frequently visited by many individuals within a period of time (e.g., a week) and revisited regularly over time. Higher-order interactions between locations can contain crucial information in certain scenarios.
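A toy sketch makes the limitation of pairwise representations concrete. It builds a weighted hypergraph in which each hyperedge is the set of locations one individual visited in a week, then compares it with the pairwise co-visit counts obtained by projecting that hypergraph onto location pairs. The data, function names, and weekly-set construction are simplifying assumptions for illustration, not the paper's method. In the example, one individual visiting A, B and C together and three individuals each visiting a different pair of those locations collapse to identical pairwise counts.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set


def weekly_hyperedges(visits: Dict[str, Set[str]]) -> Counter:
    """Count how many individuals visited each exact set of locations.

    Hypothetical helper: `visits` maps an individual id to the set of
    locations that individual visited during one week.
    """
    return Counter(frozenset(locs) for locs in visits.values() if len(locs) >= 2)


def pairwise_projection(hyperedges: Counter) -> Counter:
    """Project weighted hyperedges onto unordered location pairs."""
    pairs: Counter = Counter()
    for edge, weight in hyperedges.items():
        for a, b in combinations(sorted(edge), 2):
            pairs[(a, b)] += weight
    return pairs


# One individual visits A, B and C in the same week ...
group = weekly_hyperedges({"p1": {"A", "B", "C"}})
# ... versus three individuals who each visit only two of those locations.
pairs_only = weekly_hyperedges({"p1": {"A", "B"}, "p2": {"B", "C"}, "p3": {"A", "C"}})

print(group == pairs_only)                                             # the hypergraphs differ
print(pairwise_projection(group) == pairwise_projection(pairs_only))  # the projections match
```

Regular revisits could be captured by repeating the construction for each week and tracking how often the same hyperedge recurs.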


Neural Combinatorial Optimization for Real-World Routing

arXiv.org Artificial Intelligence

Vehicle Routing Problems (VRPs) are a class of NP-hard problems, ubiquitous in real-world logistics scenarios, that pose significant challenges for optimization. Neural Combinatorial Optimization (NCO) has emerged as a promising alternative to classical approaches, as it can learn fast heuristics to solve VRPs. However, most research on NCO for VRPs focuses on simplified settings that ignore asymmetric distances and travel durations, which cannot be derived from simple Euclidean distances, and that rely on unrealistic data distributions, hindering real-world deployment. This work introduces RRNCO (Real Routing NCO) to bridge the gap between synthetic and real-world VRPs for NCO in two critical aspects: data and modeling. First, we introduce a new, openly available dataset of real-world locations, distance matrices, and duration matrices from 100 cities, with actual routing distances and durations obtained from the Open Source Routing Machine (OSRM). Second, we propose a novel approach that efficiently processes both node and edge features through contextual gating, enabling the construction of more informed node embeddings. Finally, we incorporate an Adaptation Attention Free Module (AAFM) with neural adaptive bias mechanisms that effectively integrates not only distance matrices but also angular relationships between nodes, allowing our model to capture rich structural information. RRNCO achieves state-of-the-art results on real-world VRPs among NCO methods. We make our dataset and code publicly available at https://github.com/ai4co/real-routing-nco.
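Asymmetry matters because real road-network durations, such as those the dataset obtains from OSRM, need not form a symmetric matrix. A tour's cost can therefore depend on its direction, which a Euclidean setting cannot express. The sketch below uses a small, hypothetical duration matrix with node 0 as the depot to show that the same tour costs differently in the two directions. `route_cost` is an illustrative helper, not part of RRNCO.

```python
from typing import List, Sequence


def route_cost(route: Sequence[int], matrix: List[List[float]]) -> float:
    """Total cost of visiting `route` in order, using matrix[i][j] for the leg i -> j."""
    return sum(matrix[i][j] for i, j in zip(route, route[1:]))


# Hypothetical asymmetric travel-duration matrix (minutes); node 0 is the depot.
durations = [
    [0, 2, 9],
    [1, 0, 6],
    [7, 3, 0],
]

forward = route_cost([0, 1, 2, 0], durations)   # 2 + 6 + 7 = 15
backward = route_cost([0, 2, 1, 0], durations)  # 9 + 3 + 1 = 13
```

With a symmetric (for example, Euclidean) matrix, both directions would cost the same, so a model trained only on such instances never has to learn this distinction.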


None of the Above, Less of the Right: Parallel Patterns between Humans and LLMs on Multi-Choice Questions Answering

arXiv.org Artificial Intelligence

Multiple-choice exam questions with "None of the above" (NA) options have been extensively studied in educational testing, where existing research suggests that they better assess true knowledge. However, their impact on the evaluation of Large Language Models (LLMs) remains underexplored. Through systematic experiments with 28 LLMs on the MMLU benchmark, we examine how NA options affect model performance and confidence calibration. Our analysis reveals that NA options, when used as the correct answer, lead to a consistent 30-50% performance drop across models regardless of scale, suggesting that LLMs lack the metacognitive ability to systematically evaluate and reject all given options when none are correct. This degradation shows strong domain dependence, with minimal impact on mathematical reasoning (14.6% drop) but severe effects on tasks requiring uncertainty handling, such as business ethics (48.1% drop). Our results highlight important implications for benchmark design and raise questions about LLMs' ability to handle uncertainty in real-world applications.
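One plausible way to build the two NA conditions from a standard multiple-choice item is sketched below. For the "NA as correct answer" condition, the original correct option is dropped and "None of the above" is appended as the new key. For the distractor condition, NA replaces one wrong option and the original key stays correct. The paper's exact construction may differ, and the `MCQItem` fields and helper names are assumptions.

```python
from dataclasses import dataclass
from typing import List

NA_TEXT = "None of the above"


@dataclass
class MCQItem:
    """Hypothetical item format: question text, options, and index of the key."""
    question: str
    options: List[str]
    answer: int


def make_na_correct(item: MCQItem) -> MCQItem:
    """Remove the correct option and make "None of the above" the key."""
    remaining = [opt for i, opt in enumerate(item.options) if i != item.answer]
    return MCQItem(item.question, remaining + [NA_TEXT], len(remaining))


def make_na_distractor(item: MCQItem) -> MCQItem:
    """Replace the last incorrect option with NA; the original key stays correct."""
    idx = max(i for i in range(len(item.options)) if i != item.answer)
    options = list(item.options)  # copy so the original item is not mutated
    options[idx] = NA_TEXT
    return MCQItem(item.question, options, item.answer)


item = MCQItem("2 + 2 = ?", ["3", "5", "4", "22"], 2)
na_key = make_na_correct(item)        # options: 3, 5, 22, None of the above (key)
na_distract = make_na_distractor(item)  # options: 3, 5, 4 (key), None of the above
```

In the first variant a model can only succeed by rejecting every listed option, which is the ability the abstract reports LLMs lack.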


Unveiling AI's Threats to Child Protection: Regulatory efforts to Criminalize AI-Generated CSAM and Emerging Children's Rights Violations

arXiv.org Artificial Intelligence

This paper aims to present alarming new trends in the field of child sexual abuse imagery, as part of SafeLine's research activities on cybercrime, child sexual abuse material (CSAM), and the protection of children's rights to safe online experiences. It focuses primarily on the phenomenon of AI-generated CSAM, the sophisticated production methods discussed in dark web forums, and the crucial role that open-source AI models play in the evolution of this overwhelming phenomenon. The paper's main contribution is a correlation analysis between the hotline's reports and domain names identified in dark web forums where users' discussions focus on exchanging information specifically related to the generation of AI-CSAM. The objective was to reveal the close connection between clear net and dark web content, which was accomplished using the ATLAS dataset of the Voyager system. Furthermore, through the analysis of a set of posts drawn from this dataset, the paper draws valuable conclusions about the techniques forum members employ to produce AI-generated CSAM, and presents users' views on this type of content and the routes they follow to overcome technological barriers designed to prevent malicious use. As a final contribution, the paper presents an overview of current legislative developments in all member countries of the INHOPE organization and of the issues arising in regulating AI-CSAM, shedding light on the legal challenges of regulating and limiting the phenomenon.


Edit Once, Update Everywhere: A Simple Framework for Cross-Lingual Knowledge Synchronization in LLMs

arXiv.org Artificial Intelligence

Knowledge editing allows for efficient adaptation of large language models (LLMs) to new information or corrections without requiring full retraining. However, prior methods typically focus on either single-language editing or basic multilingual editing, failing to achieve true cross-lingual knowledge synchronization. To address this, we present a simple and practical state-of-the-art (SOTA) recipe, Cross-Lingual Knowledge Democracy Edit (X-KDE), designed to effectively propagate knowledge from a dominant language to other languages. X-KDE comprises two stages: (i) Cross-lingual Edition Instruction Tuning (XE-IT), which fine-tunes the model on a curated parallel dataset to modify in-scope knowledge while preserving unrelated information, and (ii) Target-language Preference Optimization (TL-PO), which applies advanced optimization techniques to ensure consistency across languages and foster the transfer of updates. Additionally, we contribute a high-quality cross-lingual dataset specifically designed to enhance knowledge transfer across languages. Extensive experiments on the Bi-ZsRE and MzsRE benchmarks show that X-KDE significantly enhances cross-lingual performance, achieving an average improvement of +8.19%, while maintaining high accuracy in monolingual settings.